Share the data visualizations you found online with the people sitting near you.
What kind of data would you need to recreate them?
Aesthetics: so what?
Aesthetics (such as color, size, shape, etc.) determine how data points are visually distinguished in a plot.
For example:
Democrats vs. Republicans
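To make the idea concrete, here is a minimal sketch of mapping a grouping variable to the color aesthetic. The `votes` data frame and its numbers are made up purely for illustration (they are not real election results), and the blue/red pairing is an assumed convention.

```r
library(ggplot2)

# Made-up illustrative data, NOT real election results
votes <- data.frame(
  district = 1:6,
  turnout = c(52, 61, 58, 47, 66, 55),
  party = c("Democrat", "Republican", "Democrat",
            "Republican", "Democrat", "Republican")
)

# Map `party` to color so the two groups are visually distinguished
p_party <- ggplot(votes) +
  geom_point(aes(x = district, y = turnout, color = party), size = 3) +
  # assumed color convention: one fixed color per party
  scale_color_manual(values = c(Democrat = "blue", Republican = "red"))

# print(p_party) draws the plot
```

Because `color` sits inside `aes()`, it varies with the data; a color set outside `aes()` would apply to every point equally.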
Scales: so what?
Scales control how data is mapped onto visual dimensions like the x- and y-axes.
Proper scaling can prevent misleading representations.
Code
# Load necessary libraries
library(ggplot2)
library(gridExtra)
library(tidyverse)
library(knitr)
library(ggrepel) # provides geom_text_repel(), used in later slides

# Dummy data
data <- data.frame(
  year = c(2010, 2011, 2012, 2013, 2014, 2015),
  interest_rate = c(3.5, 3.7, 3.6, 3.8, 3.9, 4.0)
)

# Plot 1: With a narrow y-axis
p1 <- ggplot(data, aes(x = year, y = interest_rate)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(color = "blue", size = 3) +
  scale_y_continuous(limits = c(3.4, 4.1))

# Plot 2: With a broader y-axis
p2 <- ggplot(data, aes(x = year, y = interest_rate)) +
  geom_line(color = "red", linewidth = 1) +
  geom_point(color = "red", size = 3) +
  scale_y_continuous(limits = c(0, 5))

# Arrange plots side by side
grid.arrange(p1, p2, ncol = 2)
pokemon data
Code
pokemon <- read_csv("../data/pokemon.csv")

# take a look at the data:
pokemon |>
  head() |>
  kable()
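| pokedex_no | name | form | type_1 | type_2 | stat_total | hp | attack | defense | sp_attack | sp_defense | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bulbasaur | NA | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 |
| 2 | Ivysaur | NA | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 |
| 3 | Venusaur | NA | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 |
| 4 | Charmander | NA | Fire | NA | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 |
| 5 | Charmeleon | NA | Fire | NA | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 |
| 6 | Charizard | NA | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 |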
Aesthetics & Scales with Pokémon
By default, the Pokémon with the highest defense and hp appear in the top-right corner:
Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp))
Modifying scales
Let’s suppose we wanted to flip that and see the Pokémon with the highest defense and lowest hp in the top-right corner.
Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  # reverse the y-axis
  scale_y_reverse()
Combining scale_, aes, & geom_
Who has low hp and high defense?
Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # new:
  geom_text(aes(x = defense, y = hp, label = name))
Limiting scales
Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  # repel the text labels:
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # limit the x-axis to `defense` of 150 or more:
  # `NA` ("Not Available") is a missing value indicator.
  # We use it here to say that there is no upper limit on the x-axis.
  scale_x_continuous(limits = c(150, NA))
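One detail worth knowing about scale limits: values outside the limits are treated as missing and dropped from the layer, whereas `coord_cartesian()` only zooms the view and keeps every row. The sketch below contrasts the two using the six rows shown in the table earlier; the `starters` data frame name and the cutoff of 60 are assumptions chosen so the small example has points on both sides.

```r
library(ggplot2)

# The six rows from the table above (only the columns needed here)
starters <- data.frame(
  name = c("Bulbasaur", "Ivysaur", "Venusaur",
           "Charmander", "Charmeleon", "Charizard"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp = c(45, 60, 80, 39, 58, 78)
)

# Scale limits: out-of-range `defense` values become NA and are not drawn
p_limits <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  scale_x_continuous(limits = c(60, NA))

# Coordinate zoom: all rows are kept, only the visible window changes
p_zoom <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  coord_cartesian(xlim = c(60, 90))

# Number of points that still have a usable x position in each plot
kept_limits <- sum(!is.na(layer_data(p_limits)$x))
kept_zoom <- sum(!is.na(layer_data(p_zoom)$x))
```

The difference matters most for layers that compute statistics (for example a smoothed line), since those would be fitted only to the rows that survive the scale limits.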
Increasing n.breaks
Code
pokemon |>
  ggplot() +
  geom_point(aes(x = defense, y = hp)) +
  scale_y_reverse() +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # make it easier to identify the precise values of `defense`:
  scale_x_continuous(limits = c(150, NA), n.breaks = 30)
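Note that `n.breaks` is a request rather than a guarantee: ggplot2 picks "nice" break values and may return a slightly different number. When exact tick positions matter, one option is to pass them through `breaks` instead. A minimal sketch, again on the six rows from the table (the `starters` name and the spacing of 10 are illustrative assumptions):

```r
library(ggplot2)

# The six rows from the table above (only the columns needed here)
starters <- data.frame(
  defense = c(49, 63, 83, 43, 58, 78),
  hp = c(45, 60, 80, 39, 58, 78)
)

# Explicit tick positions: one break every 10 units of `defense`
x_breaks <- seq(40, 90, by = 10)

p_breaks <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp)) +
  scale_x_continuous(breaks = x_breaks)
```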
Color
We can map a variable to color to reveal patterns in the data.
e.g., Are there relationships between type_1, defense, and hp?
We’re also going to filter for first-generation Pokémon to reduce the number of points.
Color by type_1
Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))
Custom color
Let’s use colors associated with 🔥, 🍃, and 💧 Pokemon:
Code
pokemon |>
  filter(generation == 1) |>
  filter(type_1 %in% c("Water", "Fire", "Grass")) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = type_1)) +
  geom_text_repel(aes(x = defense, y = hp, label = name)) +
  # use the `type_1` colors instead of the default:
  scale_color_manual(values = c(
    Water = "blue",
    Fire = "red",
    Grass = "green"
  ))
scale_color
Mewtwo has a high stat_total:
Code
pokemon |>
  filter(generation == 1) |>
  ggplot() +
  # color the points by `stat_total` instead of `type_1`:
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
size
Magikarp has a low stat_total:
Code
pokemon |>
  filter(generation == 1) |>
  # just water pokemon
  filter(type_1 == "Water") |>
  ggplot() +
  # new: `size` by `stat_total`
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  geom_text_repel(aes(x = defense, y = hp, label = name))
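Like position and color, the size aesthetic has its own scale. If the default point sizes make differences in `stat_total` hard to see, one option is to widen the output range with `scale_size()`. A minimal sketch on the six rows from the table (the `starters` name and the range of 1 to 8 are illustrative assumptions, not values from the slides):

```r
library(ggplot2)

# The six rows from the table above (only the columns needed here)
starters <- data.frame(
  defense = c(49, 63, 83, 43, 58, 78),
  hp = c(45, 60, 80, 39, 58, 78),
  stat_total = c(318, 405, 525, 309, 405, 534)
)

p_size <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, size = stat_total)) +
  # smallest stat_total is drawn at size 1, largest at size 8
  scale_size(range = c(1, 8))

# Sizes actually assigned to the points after scaling
point_sizes <- layer_data(p_size)$size
```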
Combine size and color
Code
pokemon |>
  filter(generation == 1) |>
  # just psychic pokemon
  filter(type_1 == "Psychic") |>
  ggplot() +
  # new: `color` by `stat_total`, too
  geom_point(aes(x = defense, y = hp, size = stat_total, color = stat_total)) +
  # use the `viridis` color palette instead of the default:
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
Combining color and shape
Code
pokemon |>
  # filter for first gen
  filter(generation == 1) |>
  # filter for a few types
  filter(type_1 %in% c("Normal", "Rock", "Bug", "Poison")) |>
  ggplot() +
  geom_point(aes(
    x = defense,
    y = hp,
    # new: shape points by `type_1`
    shape = type_1,
    # color points by `stat_total`
    color = stat_total
  )) +
  scale_color_viridis_c() +
  geom_text_repel(aes(x = defense, y = hp, label = name))
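ggplot2's default shape palette only covers six discrete values, which is one practical reason to filter down to a few types before mapping `shape`. Shapes can also be chosen by hand with `scale_shape_manual()`, in the same way `scale_color_manual()` was used for colors. A minimal sketch on the six rows from the table (the `starters` name and the shape codes 16 and 17 are illustrative choices):

```r
library(ggplot2)

# The six rows from the table above (only the columns needed here)
starters <- data.frame(
  type_1 = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp = c(45, 60, 80, 39, 58, 78)
)

p_shape <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp, shape = type_1), size = 3) +
  # 16 = filled circle, 17 = filled triangle
  scale_shape_manual(values = c(Grass = 16, Fire = 17))

# Shape codes actually assigned to the points
point_shapes <- layer_data(p_shape)$shape
```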
Bonus: facet-ing plots
Code
# faceting allows us to split a plot into multiple panels based on a factor
# maintaining the scales makes them directly comparable
pokemon |>
  # exclude top 1% of stat_total to see better color distribution:
  filter(stat_total < quantile(stat_total, 0.99)) |>
  ggplot() +
  geom_point(aes(x = defense, y = hp, color = stat_total)) +
  scale_color_viridis_c() +
  # new: `~` means "by", so we are saying "facet wrap by `type_1`"
  facet_wrap(~type_1)
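By default `facet_wrap()` gives every panel the same x- and y-axis ranges, which is what makes the panels directly comparable. Setting `scales = "free"` lets each panel pick its own ranges, which can show more detail within a panel at the cost of that comparability. A minimal sketch of both options on the six rows from the table (the `starters` name is an assumption for illustration):

```r
library(ggplot2)

# The six rows from the table above (only the columns needed here)
starters <- data.frame(
  type_1 = c("Grass", "Grass", "Grass", "Fire", "Fire", "Fire"),
  defense = c(49, 63, 83, 43, 58, 78),
  hp = c(45, 60, 80, 39, 58, 78)
)

base_plot <- ggplot(starters) +
  geom_point(aes(x = defense, y = hp))

# Default: shared axes, panels are directly comparable
p_fixed <- base_plot + facet_wrap(~type_1)

# Free axes: each panel zooms to its own data
p_free <- base_plot + facet_wrap(~type_1, scales = "free")
```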
Summary
Aesthetics determine how data points are visually distinguished, including aspects like color, size, and shape.
Scales control how data is mapped onto visual dimensions such as x- and y-axes. Proper scaling ensures that visualizations are easy to interpret and not misleading.
Manipulating both aesthetics and scales can reveal patterns and/or outliers in data.
Preserving scales on faceted plots can make them directly comparable.